Ensure all state recovery changes are serialized #213
Summary
The original implementation of StateRecoverySubject serialized changes only on Dispose. This PR changes these semantics to instead serialize all changes as soon as they are received by the subject.
Motivation
The purpose of StateRecoverySubject is to allow persistence of environment logic state when the workflow needs to be stopped for maintenance or in case of an unhandled exception. To minimize file IO, the previous implementation only serialized the state on subject disposal, which under normal circumstances always happens on either successful or exceptional termination of the workflow.
However, this does not account for situations where the process itself terminates abnormally in a non-recoverable way, e.g. BSOD, forceful termination of the process by either the OS or the user, or stack overflow exceptions.
Proposed Design
This PR modifies the behavior of StateRecoverySubject by ensuring that all changes to the persistent state are serialized to disk immediately upon reception of the notification.
Drawbacks
This proposal will significantly increase disk pressure for high-frequency state changes. The current implementation is not optimized for streaming: the file is deleted, rewritten and flushed for every new value. There is also no scheduling mechanism, so writes are synchronous with notifications: the stream blocks until the state is fully flushed to disk, which may also slow down the sequence that is pushing the changes.
State can also still be corrupted if the process is forcefully terminated during one of these disk writes. This is more likely for high-frequency streams, but could happen on any state change.
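To make the write path described above concrete, here is a minimal Python sketch. It is not the C# implementation in this PR: the class name StateRecoverySketch, the write_on_every_change flag, the load_state helper and the JSON file format are all assumptions made for illustration. It only contrasts the two semantics (write once on disposal versus a blocking write on every notification).

```python
import json
import os


class StateRecoverySketch:
    """Hypothetical stand-in for a subject that persists its latest value.

    Illustration only; names and file format are assumptions, not the PR's code.
    """

    def __init__(self, path, write_on_every_change=True):
        self.path = path
        self.write_on_every_change = write_on_every_change
        self.value = None
        self.write_count = 0

    def on_next(self, value):
        # New semantics: the caller blocks here until the value is on disk.
        self.value = value
        if self.write_on_every_change:
            self._write()

    def dispose(self):
        # Old semantics: the only write happens when the subject is disposed.
        if not self.write_on_every_change and self.value is not None:
            self._write()

    def _write(self):
        # The whole file is overwritten and flushed for each value. A crash
        # in the middle of this block can leave a truncated file behind.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.value, f)
            f.flush()
            os.fsync(f.fileno())
        self.write_count += 1


def load_state(path):
    """Read back the last persisted value (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With write_on_every_change=True, pushing N values costs N full file writes, each of which stalls the caller. With the flag off, nothing reaches disk until dispose() runs, which is exactly the gap that an abnormal process termination exposes.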
Unresolved Questions